{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Ungraded Lab: Gradient Descent for Logistic Regression\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Goals\n",
    "In this lab you will:\n",
    "- implement the gradient descent update step for logistic regression.\n",
    "- a version using looping\n",
    "- optionally, a version using matrices"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Dataset \n",
    "Let's start with the same dataset as before."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "X = np.array([[0.5, 1.5], [1,1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])\n",
    "y = np.array([0, 0, 0, 1, 1, 1]).reshape(-1,1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As before, we'll use a helper function to plot this data. The data points with label $y=1$ are shown as red crosses, while the data points with label $y=0$ are shown as black circles."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from lab_utils import plot_data\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "plot_data(X,y)\n",
    "\n",
    "# Set both axes to be from 0-6\n",
    "plt.axis([0, 6, 0, 6])\n",
    "# Set the y-axis label\n",
    "plt.ylabel('$x_1$')\n",
    "# Set the x-axis label\n",
    "plt.xlabel('$x_0$')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Logistic Gradient\n",
    "\n",
    " First, you will implement a non-vectorized version of the gradient. Then, you will implement a vectorized version.\n",
    "\n",
    "\n",
    "### Non- vectorized version"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "Recall the gradient descent algorithm utilizes the gradient calculation:\n",
    "$$\\begin{align*}& \\text{repeat until convergence:} \\; \\lbrace \\newline \\; & b := b -  \\alpha \\frac{\\partial J(\\mathbf{w},b)}{\\partial b} \\newline       \\; & w_j := w_j -  \\alpha \\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j} \\tag{1}  \\; & \\text{for j := 0..n-1}\\newline & \\rbrace\\end{align*}$$\n",
    "\n",
    "\n",
    "Where each iteration performs simultaneous updates on $w_j$ for all $j$, where\n",
    "$$\n",
    "\\frac{\\partial J(\\mathbf{w},b)}{\\partial b}  = \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - \\mathbf{y}^{(i)}) \\tag{2}\n",
    "$$\n",
    "$$\n",
    "\\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j}  = \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - \\mathbf{y}^{(i)})x_{j}^{(i)} \\tag{3}\n",
    "$$\n",
    "\n",
    "* m is the number of training examples in the dataset\n",
    "\n",
    "    \n",
    "*  $f_{\\mathbf{w},b}(x^{(i)})$ is the model's prediction, while $y^{(i)}$, which is the actual label\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "* For a logistic regression model for the dataset given above, the model can be representented as:\n",
    "\n",
    "    $f_{\\mathbf{w},b}(x) = g(w_0 + w_1x_1 + w_2x_2)$\n",
    "\n",
    "    where $g(z)$ is the sigmoid function:\n",
    "\n",
    "    $g(z) = \\frac{1}{1+e^{-z}}$ \n",
    "    \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We've implemented the `sigmoid` function for you already and you can simply import and use it, as shown in the code block below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from lab_utils import sigmoid \n",
    "\n",
    "print(sigmoid(0))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### compute_gradient using looping\n",
    "Implement equation (2),(3) above for all $w_j$ and $b$.\n",
    "There are many ways to implement this and you can choose an alternate approach. Outlined below is this:\n",
    "- initialize variables to accumulate dJdw and dJdb\n",
    "- loop over all examples\n",
    "    - calculate the error for that example $g(\\mathbf{x}^{(i)T}\\mathbf{w} + b) - \\mathbf{y}^{(i)}$\n",
    "    - add the error to dJdb (equation 2 above)\n",
    "    - for each input value $x_{j}^{(i)}$ in this example,  \n",
    "        - multiply the error by the input  $x_{j}^{(i)}$, and add to the corresponding element of dJdw. \n",
    "- divide dJdb and dJdw by total number of examples (m)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<details>\n",
    "<summary>\n",
    "    <font size='3', color='darkgreen'><b>Hints</b></font>\n",
    "</summary>\n",
    "\n",
    "```python\n",
    "def compute_gradient_logistic_loop(X, y, w, b): \n",
    "    \"\"\"\n",
    "    Computes the gradient for linear regression \n",
    " \n",
    "    Args:\n",
    "      X : (array_like Shape (m,n)) variable such as house size \n",
    "      y : (array_like Shape (m,1)) actual value \n",
    "      w : (array_like Shape (n,1)) values of parameters of the model      \n",
    "      b : (scalar)                 value of parameter of the model   \n",
    "    Returns\n",
    "      dJdw: (array_like Shape (n,1)) The gradient of the cost w.r.t. the parameters w. \n",
    "      dJdb: (scalar)                The gradient of the cost w.r.t. the parameter b. \n",
    "    \"\"\"\n",
    "    m,n = X.shape\n",
    "    dJdw = np.zeros((n,1))\n",
    "    dJdb = 0.\n",
    "    err  = 0.\n",
    "\n",
    "    ### BEGIN SOLUTION ###\n",
    "    for i in range(m):\n",
    "        err = sigmoid(X[i] @ w + b)  - y[i]    \n",
    "        for j in range(n):\n",
    "            dJdw[j] = dJdw[j] + err * X[i][j]\n",
    "        dJdb = dJdb + err\n",
    "    dJdw = dJdw/m\n",
    "    dJdb = dJdb/m\n",
    "    ### END CODE HERE ###         \n",
    "        \n",
    "    return dJdb[0],dJdw  #index dJdb to return scalar value\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def compute_gradient_logistic_loop(X, y, w, b): \n",
    "    \"\"\"\n",
    "    Computes the gradient for linear regression \n",
    " \n",
    "    Args:\n",
    "      X : (array_like Shape (m,n)) variable such as house size \n",
    "      y : (array_like Shape (m,1)) actual value \n",
    "      w : (array_like Shape (n,1)) values of parameters of the model      \n",
    "      b : (scalar)                 value of parameter of the model   \n",
    "    Returns\n",
    "      dJdw: (array_like Shape (n,1)) The gradient of the cost w.r.t. the parameters w. \n",
    "      dJdb: (scalar)                The gradient of the cost w.r.t. the parameter b. \n",
    "    \"\"\"\n",
    "    m,n = X.shape\n",
    "    dJdw = np.zeros((n,1))\n",
    "    dJdb = 0.\n",
    "    err  = 0.\n",
    "\n",
    "    ### START CODE HERE ### \n",
    "    ### BEGIN SOLUTION ###\n",
    "    for i in range(m):\n",
    "        err = sigmoid(X[i] @ w + b)  - y[i]    \n",
    "        for j in range(n):\n",
    "            dJdw[j] = dJdw[j] + err * X[i][j]\n",
    "        dJdb = dJdb + err\n",
    "    dJdw = dJdw/m\n",
    "    dJdb = dJdb/m\n",
    "    ### END SOLUTION ### \n",
    "    ### END CODE HERE ###         \n",
    "        \n",
    "    return dJdb[0],dJdw  #index dJdb to return scalar value"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "w = np.array([2.,3.]).reshape(-1,1)\n",
    "b = 1.\n",
    "dJdb, dJdw = compute_gradient_logistic_loop(X, y, w, b)\n",
    "print(f\"dJdb, non-vectorized version: {dJdb}\" )\n",
    "print(f\"dJdw, non-vectorized version: {dJdw.tolist()}\" )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Check the implementation of your gradient function using the cell below."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Expected output**\n",
    "\n",
    "``` \n",
    "dJdb, non-vectorized version: 0.49861806546328574\n",
    "dJdw, non-vectorized version: [[0.498333393278696], [0.49883942983996693]]\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### (Optional ) Vectorized version\n",
    "\n",
    "You will now implement a vectorized version of the gradient function.\n",
    "\n",
    "The vectorized version of the gradient formula is \n",
    "\n",
    "$$\\frac{\\partial \\mathbf{J_{w,b}}(\\mathbf{X,y})}{\\partial \\mathbf{b}}= \\frac{1}{m} sum(\\mathbf{f_{w,b}} - \\mathbf{y}) \\tag{4}$$ \n",
    "\n",
    "\n",
    "$$\\nabla_{\\mathbf{w}}\\mathbf{J} = \\frac{1}{m} \\mathbf{X^T}(\\mathbf{f} - \\mathbf{y}) \\tag{5}$$ \n",
    "\n",
    "where\n",
    "\n",
    "$$ \\mathbf{f_{w,b}} = g(\\mathbf{X}  \\mathbf{w})$$\n",
    "\n",
    "As before, $g$ is the sigmoid function"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "**Exercise**\n",
    "\n",
    "You'll complete the vectorized cost function utilizing the equations above. The Hint is available  if you run into difficulties.\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Debugging Tip:** Vectorizing code can sometimes be tricky. One common strategy for debugging is to print out the sizes of the matrices you are working with using the size function. For example, given a data matrix $\\mathbf{X}$ of size 6 × 3 (6 examples, 3 features) and $\\mathbf{w}$, a vector with dimensions 3x1, you can observe that $\\mathbf{Xw}$ is a valid multiplication operation, while $\\mathbf{wX}$ is not."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<details>\n",
    "<summary>\n",
    "    <font size='3', color='darkgreen'><b>Hints</b></font>\n",
    "</summary>\n",
    "\n",
    "```python\n",
    "def compute_gradient_logistic_matrix(X, y, w, b): \n",
    "    \"\"\"\n",
    "    Computes the gradient for linear regression \n",
    " \n",
    "    Args:\n",
    "      X : (array_like Shape (m,n)) variable such as house size \n",
    "      y : (array_like Shape (m,1)) actual value \n",
    "      w : (array_like Shape (n,1)) Values of parameters of the model      \n",
    "      b : (scalar )                Values of parameter of the model      \n",
    "    Returns\n",
    "      dJdw: (array_like Shape (n,1)) The gradient of the cost w.r.t. the parameters w. \n",
    "      dJdb: (scalar)                 The gradient of the cost w.r.t. the parameter b. \n",
    "                                  \n",
    "    \"\"\"\n",
    "    m,n = X.shape\n",
    "    ### START CODE HERE ### \n",
    "    f_wb =  sigmoid(X @ w + b)      \n",
    "    err  = f_wb - y                 \n",
    "    dJdw = (1/m) * (X.T @ err)      \n",
    "    dJdb = (1/m) * np.sum(err)      \n",
    "    ### END CODE HERE ###         \n",
    "        \n",
    "    return dJdb,dJdw\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def compute_gradient_logistic_matrix(X, y, w, b): \n",
    "    \"\"\"\n",
    "    Computes the gradient for linear regression \n",
    " \n",
    "    Args:\n",
    "      X : (array_like Shape (m,n)) variable such as house size \n",
    "      y : (array_like Shape (m,1)) actual value \n",
    "      w : (array_like Shape (n,1)) Values of parameters of the model      \n",
    "      b : (scalar )                Values of parameter of the model      \n",
    "    Returns\n",
    "      dJdw: (array_like Shape (n,1)) The gradient of the cost w.r.t. the parameters w. \n",
    "      dJdb: (scalar)                 The gradient of the cost w.r.t. the parameter b. \n",
    "                                  \n",
    "    \"\"\"\n",
    "    m,n = X.shape\n",
    "    ### START CODE HERE ### \n",
    "    ### BEGIN SOLUTION ###\n",
    "    f_wb =  sigmoid(X @ w + b)      ##None\n",
    "    err  = f_wb - y                 ##None\n",
    "    dJdw = (1/m) * (X.T @ err)      ##None\n",
    "    dJdb = (1/m) * np.sum(err)      ##None\n",
    "    ### END SOLUTION ### \n",
    "    ### END CODE HERE ###         \n",
    "        \n",
    "    return dJdb,dJdw"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's check if the output of this function is equivalent to the output of your non-vectorized implementation above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "w = np.array([2.,3.]).reshape(-1,1)\n",
    "b = 1.\n",
    "dJdb, dJdw = compute_gradient_logistic_loop(X, y, w, b)\n",
    "print(f\"dJdb, non-vectorized version: {dJdb}\" )\n",
    "print(f\"dJdw, non-vectorized version: {dJdw.tolist()}\" )\n",
    "dJdb, dJdw = compute_gradient_logistic_matrix(X, y, w, b)\n",
    "print(f\"dJdb, vectorized version: {dJdb}\" )\n",
    "print(f\"dJdw, vectorized version: {dJdw.tolist()}\" )\n",
    "#print(\"Gradients computed by matrix version: \\n\", compute_gradient_logistic_matrix(X, y, w, b, predict))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Expected output** \n",
    "\n",
    "```\n",
    "dJdb, non-vectorized version: 0.49861806546328574\n",
    "dJdw, non-vectorized version: [[0.498333393278696], [0.49883942983996693]]\n",
    "dJdb, vectorized version: 0.49861806546328574\n",
    "dJdw, vectorized version: [[0.498333393278696], [0.4988394298399669]]\n",
    "```\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
